Unsupervised Learning of Depth and Ego-Motion from Monocular Video Using 3D Geometric Constraints
Authors
Abstract
We present a novel approach for unsupervised learning of depth and ego-motion from monocular video. Unsupervised learning removes the need for separate supervisory signals (depth or ego-motion ground truth, or multi-view video). Prior work in unsupervised depth learning uses pixel-wise or gradient-based losses, which only consider pixels in small local neighborhoods. Our main contribution is to explicitly consider the inferred 3D geometry of the scene, enforcing consistency of the estimated 3D point clouds and ego-motion across consecutive frames. This is a challenging task and is solved by a novel (approximate) backpropagation algorithm for aligning 3D structures. We combine this novel 3D-based loss with 2D losses based on the photometric quality of frame reconstructions using estimated depth and ego-motion from adjacent frames. We also incorporate validity masks to avoid penalizing areas in which no useful information exists. We test our algorithm on the KITTI dataset and on a video dataset captured on an uncalibrated mobile phone camera. Our proposed approach consistently improves depth estimates on both datasets, and outperforms the state-of-the-art for both depth and ego-motion. Because we only require a simple video, learning depth and ego-motion on large and varied datasets becomes possible. We demonstrate this by training on the low-quality uncalibrated video dataset and evaluating on KITTI, ranking among top performing prior methods which are trained on KITTI itself.
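The abstract describes two complementary supervision signals: a 2D photometric loss, obtained by reconstructing the current frame from an adjacent one using the estimated depth and ego-motion, and a 3D loss that enforces consistency between the point clouds implied by consecutive depth maps. The NumPy sketch below illustrates how such terms could be computed for a pinhole camera with intrinsics K; the function names, the nearest-neighbour image sampling, and the nearest-neighbour point matching that stands in for the paper's approximate-ICP alignment are illustrative assumptions, not the authors' implementation.

import numpy as np

def pixel_to_cam(depth, K_inv):
    # Back-project a depth map of shape (H, W) into camera-frame 3D points (H*W, 3).
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).T  # 3 x HW homogeneous pixels
    rays = K_inv @ pix                                                 # normalized viewing rays
    return (rays * depth.reshape(1, -1)).T                             # scale rays by depth

def photometric_loss(frame_t, frame_s, depth_t, T_t_to_s, K):
    # Warp the adjacent (source) frame s into view t using depth_t and the 4x4
    # ego-motion T_t_to_s, then compare intensities (L1). Pixels that project
    # outside the source image are masked out, loosely mirroring the paper's
    # validity masks.
    h, w = depth_t.shape
    pts_t = pixel_to_cam(depth_t, np.linalg.inv(K))            # HW x 3 points in frame t
    pts_s = T_t_to_s[:3, :3] @ pts_t.T + T_t_to_s[:3, 3:4]     # 3 x HW points in frame s
    proj = K @ pts_s
    u, v, z = proj[0] / proj[2], proj[1] / proj[2], proj[2]
    valid = (z > 0) & (u >= 0) & (u <= w - 1) & (v >= 0) & (v <= h - 1)
    ui = np.clip(np.round(u).astype(int), 0, w - 1)            # nearest-neighbour sampling
    vi = np.clip(np.round(v).astype(int), 0, h - 1)            # (a real model would use bilinear)
    err = np.abs(frame_s[vi, ui] - frame_t.reshape(-1))
    return err[valid].mean() if valid.any() else 0.0

def point_cloud_loss(depth_t, depth_s, T_t_to_s, K, n_samples=500):
    # Crude stand-in for the paper's ICP-based 3D loss: move the point cloud of
    # frame t into frame s with the estimated ego-motion and measure the mean
    # nearest-neighbour distance to a subsampled point cloud of frame s.
    K_inv = np.linalg.inv(K)
    pts_t = pixel_to_cam(depth_t, K_inv)
    pts_s = pixel_to_cam(depth_s, K_inv)
    pts_t_in_s = (T_t_to_s[:3, :3] @ pts_t.T + T_t_to_s[:3, 3:4]).T
    idx = np.random.choice(len(pts_t_in_s), size=min(n_samples, len(pts_t_in_s)), replace=False)
    d = np.linalg.norm(pts_t_in_s[idx, None, :] - pts_s[None, ::97, :], axis=-1)
    return d.min(axis=1).mean()

In the full method, the 3D residuals are backpropagated through an approximate alignment procedure and the photometric term is combined with validity masks during training; the sketch only conveys the general shape of the two loss terms.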
Similar resources
A Machine Learning Approach to Recovery of Scene Geometry from Images
Recovering the 3D structure of the scene from images yields useful information for tasks such as shape and scene recognition, object detection, or motion planning and object grasping in robotics. In this thesis, we introduce a general machine learning approach called unsupervised CRF learning based on maximizing the conditional likelihood. We describe the application of our machine learning app...
Video Subject Inpainting: A Posture-Based Method
Despite recent advances in video inpainting techniques, reconstructing large missing regions of a moving subject while its scale changes remains an elusive goal. In this paper, we introduce a scale-change-invariant method for large missing regions to tackle this problem. Using this framework, the moving foreground is first separated from the background and its scale is equalized. Then, a ...
Self-Supervised Depth Learning for Urban Scene Understanding
As an agent moves through the world, the apparent motion of scene elements is (usually) inversely proportional to their depth. It is natural for a learning agent to associate image patterns with the magnitude of their displacement over time: as the agent moves, far away mountains don't move much; nearby trees move a lot. This natural relationship between the appearance of objects and their mot...
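As a back-of-the-envelope illustration of the parallax-depth relationship mentioned above (added here for clarity, not taken from the cited work): for a camera with focal length f (in pixels) translating laterally by t_x, a static point at depth Z shifts between frames by approximately

\Delta u \approx \frac{f \, t_x}{Z},

so the apparent image motion falls off inversely with depth, and depth can be read off as Z \approx f t_x / \Delta u.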
Application of 3D Human Motion in the Sports based on Computer Aided Analysis
3D human motion tracking has in recent years become a very important research direction in machine vision, with a wide range of applications such as human-computer interaction, intelligent animation, and video surveillance. Current research on three-dimensional human motion tracking is mostly based on multi-view video; monocular video, due to the lack of depth information, ...
Covariance Scaled Sampling for Monocular 3D Body Tracking
We present a method for recovering 3D human body motion from monocular video sequences using robust image matching, joint limits and non-self-intersection constraints, and a new sample-and-refine search strategy guided by rescaled cost-function covariances. Monocular 3D body tracking is challenging: for reliable tracking at least 30 joint parameters need to be estimated, subject to highly nonlin...
Journal: CoRR
Volume: abs/1802.05522
Issue: -
Pages: -
Publication year: 2018